expert prompt
MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper
Zeng, Runjia, Sun, Guangyan, Wang, Qifan, Geng, Tong, Dianat, Sohail, Han, Xiaotian, Rao, Raghuveer, Zhang, Xueling, Han, Cheng, Huang, Lifu, Liu, Dongfang
Considering deep neural networks as manifold mappers, the pretrain-then-fine-tune paradigm can be interpreted as a two-stage process: pretrain establishes a broad knowledge base, and fine-tune adjusts the model parameters to activate specific neural pathways to align with the target manifold. Although prior fine-tuning approaches demonstrate success, their rigid parameter space limits their ability to dynamically activate appropriate neural pathways, rendering them ill-equipped to adapt flexibly to the diverse and evolving data distributions. In light of this view, we propose a novel approach, Mixture of Expert Prompt Tuning (MEPT), as an effective and efficient manifold-mapping framework. MEPT leverages the Mixture of Experts architecture by integrating multiple prompt experts to adaptively learn diverse and non-stationary data distributions. Empirical evaluations demonstrate that MEPT outperforms several state-of-the-art parameter efficient baselines on SuperGLUE, achieving notable improvements in mean accuracy (e.g., 1.94%) while significantly reducing activated prompts by 79.25%. The effectiveness of MEPT is further supported by theoretical insights from manifold learning and validated through neural activation pathway visualization results. Our code is avaliable at https://runjia.tech/emnlp_mept/.
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
Chen, Jiali, Hei, Xusen, Xue, Yuqi, Wei, Yuancheng, Xie, Jiayuan, Cai, Yi, Li, Qing
Large multimodal models (LMMs) have shown remarkable performance in the visual commonsense reasoning (VCR) task, which aims to answer a multiple-choice question based on visual commonsense within an image. However, the ability of LMMs to correct potential visual commonsense errors in the distractor upon their occurrence is yet under-explored. Drawing inspiration from how a human teacher crafts challenging distractors to test students' comprehension of the concepts or skills and assists them in identifying and correcting errors toward the answer, we are the pioneering research for LMMs to simulate this error correction process. To this end, we employ GPT-4 as a ``teacher'' to collect the explainable feedback dataset VCR-DF for error correction, which serves as a benchmark to evaluate the ability of LMMs to identify misconceptions and clarify reasons behind the error in VCR distractors toward final answers. In addition, we propose an LMM-based Pedagogical Expert Instructed Feedback Generation (PEIFG) model to incorporate the learnable expert prompts and multimodal instruction as guidance for feedback generation. Experimental results show that our PEIFG significantly outperforms existing LMMs. We believe that our benchmark provides a new direction for evaluating the capabilities of LMMs.
From Words to Workflows: Automating Business Processes
Minkova, Laura, Espejel, Jessica Lรณpez, Djaidja, Taki Eddine Toufik, Dahhane, Walid, Ettifouri, El Hassane
As businesses increasingly rely on automation to streamline operations, the limitations of Robotic Process Automation (RPA) have become apparent, particularly its dependence on expert knowledge and inability to handle complex decision-making tasks. Recent advancements in Artificial Intelligence (AI), particularly Generative AI (GenAI) and Large Language Models (LLMs), have paved the way for Intelligent Automation (IA), which integrates cognitive capabilities to overcome the shortcomings of RPA. This paper introduces Text2Workflow, a novel method that automatically generates workflows from natural language user requests. Unlike traditional automation approaches, Text2Workflow offers a generalized solution for automating any business process, translating user inputs into a sequence of executable steps represented in JavaScript Object Notation (JSON) format. Leveraging the decision-making and instruction-following capabilities of LLMs, this method provides a scalable, adaptable framework that enables users to visualize and execute workflows with minimal manual intervention. This research outlines the Text2Workflow methodology and its broader implications for automating complex business processes.
Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions
Wang, Hongchen, Li, Kangming, Ramsay, Scott, Fehlis, Yao, Kim, Edward, Hattrick-Simpers, Jason
Large Language Models (LLMs) have the potential to revolutionize scientific research, yet their robustness and reliability in domain-specific applications remain insufficiently explored. This study conducts a comprehensive evaluation and robustness analysis of LLMs within the field of materials science, focusing on domain-specific question answering and materials property prediction. Three distinct datasets are used in this study: 1) a set of multiple-choice questions from undergraduate-level materials science courses, 2) a dataset including various steel compositions and yield strengths, and 3) a band gap dataset, containing textual descriptions of material crystal structures and band gap values. The performance of LLMs is assessed using various prompting strategies, including zero-shot chain-of-thought, expert prompting, and few-shot in-context learning. The robustness of these models is tested against various forms of 'noise', ranging from realistic disturbances to intentionally adversarial manipulations, to evaluate their resilience and reliability under real-world conditions. Additionally, the study uncovers unique phenomena of LLMs during predictive tasks, such as mode collapse behavior when the proximity of prompt examples is altered and performance enhancement from train/test mismatch. The findings aim to provide informed skepticism for the broad use of LLMs in materials science and to inspire advancements that enhance their robustness and reliability for practical applications.
SAM-SP: Self-Prompting Makes SAM Great Again
Zhou, Chunpeng, Ning, Kangjie, Shen, Qianqian, Zhou, Sheng, Yu, Zhi, Wang, Haishuai
The recently introduced Segment Anything Model (SAM), a Visual Foundation Model (VFM), has demonstrated impressive capabilities in zero-shot segmentation tasks across diverse natural image datasets. Despite its success, SAM encounters noticeably performance degradation when applied to specific domains, such as medical images. Current efforts to address this issue have involved fine-tuning strategies, intended to bolster the generalizability of the vanilla SAM. However, these approaches still predominantly necessitate the utilization of domain specific expert-level prompts during the evaluation phase, which severely constrains the model's practicality. To overcome this limitation, we introduce a novel self-prompting based fine-tuning approach, called SAM-SP, tailored for extending the vanilla SAM model. Specifically, SAM-SP leverages the output from the previous iteration of the model itself as prompts to guide subsequent iteration of the model. This self-prompting module endeavors to learn how to generate useful prompts autonomously and alleviates the dependence on expert prompts during the evaluation phase, significantly broadening SAM's applicability. Additionally, we integrate a self-distillation module to enhance the self-prompting process further. Extensive experiments across various domain specific datasets validate the effectiveness of the proposed SAM-SP. Our SAM-SP not only alleviates the reliance on expert prompts but also exhibits superior segmentation performance comparing to the state-of-the-art task-specific segmentation approaches, the vanilla SAM, and SAM-based approaches.
PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization
Wang, Xinyuan, Li, Chenxi, Wang, Zhen, Bai, Fan, Luo, Haotian, Zhang, Jiayou, Jojic, Nebojsa, Xing, Eric P., Hu, Zhiting
Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights based on a deep understanding of both instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth of domain knowledge and struggle to efficiently explore the vast space of expert-level prompts. Addressing this, we present PromptAgent, an optimization method that autonomously crafts prompts equivalent in quality to those handcrafted by experts. At its core, PromptAgent views prompt optimization as a strategic planning problem and employs a principled planning algorithm, rooted in Monte Carlo tree search, to strategically navigate the expert-level prompt space. Inspired by human-like trial-and-error exploration, PromptAgent induces precise expert-level insights and in-depth instructions by reflecting on model errors and generating constructive error feedback. Such a novel framework allows the agent to iteratively examine intermediate prompts (states), refine them based on error feedbacks (actions), simulate future rewards, and search for high-reward paths leading to expert prompts. We apply PromptAgent to 12 tasks spanning three practical domains: BIG-Bench Hard (BBH), as well as domain-specific and general NLP tasks, showing it significantly outperforms strong Chain-of-Thought and recent prompt optimization baselines. Extensive analyses emphasize its capability to craft expert-level, detailed, and domain-insightful prompts with great efficiency and generalizability.